Module 1, Lecture 3: Initiating R and RStudio

M Hallett
January 2015

R, RStudio, GIT repositories for the course.

COMP-364 Tools for the Life Sciences

The R Project (www.r-project.org)

R homepage

What is R?

  • R is a language and environment for statistical computing and graphics.
  • It is a GNU project which is similar to the S language developed at Bell Laboratories by John Chambers et al.

  • R and its associated packages together provide a wide variety of functionality:

    • linear and non-linear modelling,
    • classical statistical tests,
    • time-series,
    • classification,
    • clustering,
    • graphics, … and it is highly extensible.





Installing R on your machine

(if you want to… not necessary but sometimes easier)

Available @ CRAN (Comprehensive R Archive Network), version 3.1.2.

R version 3.1.2 (2014-10-31) -- "Pumpkin Helmet"
Copyright (C) 2014 The R Foundation for Statistical Computing
Platform: x86_64-apple-darwin10.8.0 (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.
>

Manuals and other learning aids

RStudio: tools that make R easier to use

RStudio

  • 4 main windows
    • top left: your code
    • bottom left: an R session where you can run your code
    • top right: details of your R environment (defined variables, data structures, history, …)
    • bottom right: input/output (access to your files, plots that you produce, help, …)

Starting RStudio (1)

Step 1 [Mac or other Unix machine]

  • On a Mac, open a terminal window (look under Applications).

Step 1 [PC]

  • Download PuTTY: http://www.chiark.greenend.org.uk/~sgtatham/putty/download.html

  • You need the development version v. 0.64 as there is a bug in the stable release.

Starting RStudio (2)

Step 2 [Mac or Unix]

  • Create a “tunnel” between your machine and the RStudio server by typing the following into the Unix terminal window.

ssh -f <socs_username>@rstudio.cs.mcgill.ca -L 8787:rstudio.cs.mcgill.ca:8787 -N

(You don't really need to understand this step.)
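For the curious, the pieces of the tunnel command can be spelled out one by one. The sketch below only prints the command instead of running it, so it is safe to try anywhere. The helper name tunnel_cmd and the username jdoe are made up for illustration; they are not part of the course setup.

```shell
# Hypothetical helper (an illustration, not a course tool): build the tunnel
# command for a given SOCS username without actually connecting.
tunnel_cmd() {
  user="$1"                      # your SOCS username
  host="rstudio.cs.mcgill.ca"    # the course RStudio server
  port=8787                      # port used on both ends of the tunnel
  # -f : send ssh to the background once the connection is set up
  # -L : forward local port $port to $host:$port, as reached from the server
  # -N : do not run a remote command; only forward the port
  printf 'ssh -f %s@%s -L %s:%s:%s -N\n' "$user" "$host" "$port" "$host" "$port"
}

tunnel_cmd jdoe   # "jdoe" is a placeholder username
```

Running it with your own username should print exactly the line you are asked to type above.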

RStudio: Connecting to a remote R server

Remote Server

Starting RStudio (3)

Step 3

  • Open your web browser (preferably Chrome; don't use Explorer).

Step 4

  • For the address, type localhost:8787.
  • localhost is how a browser points to a site within the same computer.
  • 8787 is a port. (Think of it this way: the RStudio server has been told to send all of its data and messages to port 8787. The number 8787 is essentially arbitrary, but it is agreed upon between the web browser and the RStudio server. It's like assigning an airplane to a gate at the airport: departing passengers know where to wait, and arriving passengers know where to get off and be processed by immigration.)

Cloning via RStudio & GIT (1)

  • You should only do this “cloning” process once, now.
  • We will provide you additional instructions as to how to smoothly update these files without re-cloning.

Step 1

  • Initiate an RStudio session on rstudio.cs.mcgill.ca as above.

Step 2

  • Go to Tools/Shell.
  • This creates a pop-up window: a Unix shell for your account on rstudio.cs.mcgill.ca.
  • Execute a pwd command to see where you are.
  • (You are likely at your home directory ~.)

Cloning via RStudio & GIT (2)

Step 3

git clone /repo/COMP364/2015/Winter/student-repo.git ~/cs364

When this finishes execution, a copy (clone) of all the files that Dani and I created for the course will be in a directory in your home (~). You can read, write, modify, add, and delete these files as you want.

Cloning via RStudio & GIT (3)

Step 4

ls -l
total 8
drwxr-xr-x 3 hallett nogroup 3 Jan 11 13:41 R
drwxr-xr-x 8 hallett nogroup 9 Jan 12 14:10 cs364

So there are now two directories… the newest is cs364 that you just cloned. The other was created by R for its own internal purposes. Ignore it.

cd cs364
Change directory (cd) into the cs364 directory that you just cloned.

Cloning the course files via RStudio & GIT (4)

ls -l
total 17
-rw-r--r-- 1 hallett nogroup 91 Jan 12 14:10 README
drwxr-xr-x 2 hallett nogroup 3 Jan 12 14:10 assignments
drwxr-xr-x 2 hallett nogroup 8 Jan 12 14:10 data
drwxr-xr-x 2 hallett nogroup 3 Jan 12 14:10 experiments
drwxr-xr-x 2 hallett nogroup 3 Jan 12 14:10 lectures
drwxr-xr-x 2 hallett nogroup 4 Jan 12 14:10 src

cat README

The cat (concatenate) command allows you to look at the contents of a text file.

README files are a Unix convention for giving users important information about the files in a directory.

Cloning via RStudio & GIT (5)

cd lectures
ls -l

But it's easier to examine these files using the bottom-right RStudio window (Files).

Creating a Project within RStudio

  • To make the connection between our (Dani and my) cs364 repository and your clone of it, first make an RStudio project.
  • In RStudio, choose File/New Project/Existing Directory/Browse.
  • Now select your cloned directory cs364.
  • In the Top-Right RStudio window, select Git.
  • Any files you change will be highlighted.
  • If you select these files and then hit Commit, your changes will be recorded in the Git history of your clone.

"Pulling" updates of the course files via GIT (1)

The nice thing about using GIT is that it allows us to smoothly update, fix and modify the course notes, data and R scripts.

Step 1

  • Initiate an RStudio session on rstudio.cs.mcgill.ca as above.

Step 2 (simple)

  • Assuming you have cloned the cs364 directory and you have made an Rproj (see above), select the Top-Right RStudio window.
  • Choose the Git panel and hit the Pull button.

"Pulling" updates of the course files via GIT (2)

Step 2 (more complicated)

  • Go to Tools/Shell.
  • cd ~/cs364 or wherever you created your cs364 directory.
  • git pull

In other words, cd (change directory) to the place where you initially cloned the student cs364 directory (the previous slides suggest ~/cs364). The pull command “pulls” all the updates, modifications and additions that Dani or I have made and merges them with your files, seamlessly.
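If you want to watch a clone followed by a pull without touching the real course repository, the cycle can be rehearsed in a throwaway directory. This is only a sketch under assumptions: course.git, instructor and lecture2.txt are invented stand-ins, and the real repository on the server may be organized differently.

```shell
# Sandbox sketch: every name below is a stand-in, not the real course repo.
sandbox=$(mktemp -d)
cd "$sandbox"

# 1. A stand-in for the instructors' shared repository.
git init --quiet --bare course.git

# 2. The "instructor" adds a first file and publishes it.
git clone --quiet course.git instructor 2>/dev/null
cd instructor
git config user.name "Instructor"              # identity used only in this sandbox
git config user.email "instructor@example.org"
echo "This is the course repository." > README
git add README
git commit --quiet -m "Add README"
git push --quiet origin HEAD
cd ..

# 3. The "student" clones once, as in the cloning slides.
git clone --quiet course.git cs364

# 4. Later, the instructor publishes a new file.
cd instructor
echo "new lecture notes" > lecture2.txt
git add lecture2.txt
git commit --quiet -m "Add lecture 2"
git push --quiet origin HEAD
cd ..

# 5. The student pulls; the new file appears without re-cloning.
cd cs364
git pull --quiet
ls
```

After the final step the student copy contains both README and lecture2.txt, which is the behaviour the Pull button gives you in RStudio.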

RStudio Problems

  • If RStudio hangs, try killing the (underlying) R session first (Session/Terminate R).
  • Otherwise,
    • log onto the rstudio.cs.mcgill.ca server with your SOCS account (see above)
    • top
    • type u <username>
    • find the processes whose names start with ssh
    • kill -9 <process id>
    • Back on your computer, re-do the ssh tunnel command from above,
      ssh -f <socs_username>@rstudio.cs.mcgill.ca -L 8787:rstudio.cs.mcgill.ca:8787 -N
    • then reload RStudio in your browser.

RStudio Problems (2)

  • Otherwise otherwise,
    • kill your browser and start up again,
    • then throw your computer around the room.

Your SOCS account on rstudio.cs.mcgill.ca

Step 1 [Mac or other Unix machine]

  • On a Mac, open a terminal window (look under Applications).

Step 2 [Mac or Unix]

  • Create a “secure shell” to log into the rstudio.cs server.

ssh -l <socs_username> rstudio.cs.mcgill.ca

(You will be prompted for your password.)

  • The Unix on the RStudio server behaves just like the Unix on your Mac for all intents and purposes.
  • See the “Some Basic Unix” slides for general Unix commands.

Some Basic Unix (1)

  • Log into your account on rstudio.cs (see above).
  • You can also do Tools/Shell within RStudio.

cd
cd = “change directory”. When you type this alone, you are sent back to your home.

cd ~
The tilde is a symbol meaning your home. Executing “cd” and “cd ~” are the same.

ls
cs364 R
ls = list. This lists all the files & directories in your current location.

Some Basic Unix (2)

cd ~/cs364
This moves you down one level of the tree into your cs364 directory.

ls -l
total 19
drwxr-xr-x 2 hallett nogroup 3 Jan 12 14:10 assignments
-rw-r--r-- 1 hallett nogroup 205 Jan 12 14:29 cs364.Rproj
drwxr-xr-x 2 hallett nogroup 8 Jan 12 14:10 data
drwxr-xr-x 2 hallett nogroup 3 Jan 12 14:10 experiments
drwxr-xr-x 3 hallett nogroup 5 Jan 12 14:25 lectures
-rw------- 1 hallett nogroup 0 Jan 12 14:57 my.new.lecture
-rw-r--r-- 1 hallett nogroup 91 Jan 12 14:10 README
drwxr-xr-x 2 hallett nogroup 4 Jan 12 14:10 src
-l is called a flag. With it, the ls command gives you more information, and you can distinguish directories (lines starting with d) from files (lines starting with -).
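The same distinction between directories and files can be made by the shell itself. Here is a small sketch in a throwaway directory; the helper name classify and the sandbox contents are invented for illustration.

```shell
# Sandbox sketch: build a tiny directory tree, then classify each entry.
sandbox=$(mktemp -d)
mkdir "$sandbox/data" "$sandbox/lectures"
touch "$sandbox/README"

# Hypothetical helper: report whether a path is a directory or a file.
classify() {
  if [ -d "$1" ]; then     # -d is true only for directories
    echo "directory"
  else
    echo "file"
  fi
}

cd "$sandbox"
ls -l                      # first character of each line: d = directory, - = file

for entry in *; do
  echo "$entry: $(classify "$entry")"
done
```

The loop should label data and lectures as directories and README as a file, matching the first column of the ls -l output.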

cat README
This is the course repository. Each directory contains a brief description of its purpose.
cat = concatenate. Print the contents of a text file.
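The name “concatenate” makes more sense once cat is given more than one file. A minimal sketch, using two made-up files (part1.txt and part2.txt) in a throwaway directory:

```shell
# Sandbox sketch: the file names here are invented for illustration.
sandbox=$(mktemp -d)
printf 'first file\n'  > "$sandbox/part1.txt"
printf 'second file\n' > "$sandbox/part2.txt"

cat "$sandbox/part1.txt"                        # one file: just print it
cat "$sandbox/part1.txt" "$sandbox/part2.txt"   # two files: print one after the other

# Redirect the joined text into a new file instead of the screen.
cat "$sandbox/part1.txt" "$sandbox/part2.txt" > "$sandbox/both.txt"
```

With a single argument, as in cat README, there is nothing to join, so the file is simply printed.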

Some Basic Unix (3)

cd lectures
This moves you down another level of the tree.

pwd
/home/2015/hallett/cs364/lectures
pwd = print working directory. This tells you where you are in your tree.

touch my.new.lecture
touch creates an empty file with the given name.

Some Basic Unix (4)

ls -l
total 18959
-rw-r--r-- 1 hallett nogroup 48 Jan 12 14:10 lectures-readme.txt
drwx------ 3 hallett nogroup 5 Jan 12 14:23 M1.L1
-rw------- 1 hallett nogroup 19319296 Jan 12 14:24 M1.L2.ppt
-rw------- 1 hallett nogroup 0 Jan 12 2015 my.new.lecture

Some Basic Unix (5)

mv my.new.lecture ~/my.new.lecture.moved
mv = move. This moves the file “my.new.lecture” to a new location (my home directory ~) and a new name “my.new.lecture.moved”.

ls -l
total 18958
-rw-r--r-- 1 hallett nogroup 48 Jan 12 14:10 lectures-readme.txt
drwx------ 3 hallett nogroup 5 Jan 12 14:23 M1.L1
-rw------- 1 hallett nogroup 19319296 Jan 12 14:24 M1.L2.ppt

Some Basic Unix (6)

cd ~
Let's move back to the top where we moved the file.

ls -l
total 8
drwxr-xr-x 9 hallett nogroup 13 Jan 12 14:57 cs364
-rw------- 1 hallett nogroup 0 Jan 12 15:12 my.new.lecture.moved
drwxr-xr-x 3 hallett nogroup 3 Jan 11 13:41 R
-rw-r--r-- 1 hallett nogroup 5 Jan 11 14:02 tmp.R

cp my.new.lecture.moved didntmove.again
cp=copy. It makes a copy (but doesn't destroy the original) of a file. The second argument is the name of the new copy.

Some Basic Unix (7)

ls -l
total 9
drwxr-xr-x 9 hallett nogroup 13 Jan 12 14:57 cs364
-rw------- 1 hallett nogroup 0 Jan 12 2015 didntmove.again
-rw------- 1 hallett nogroup 0 Jan 12 15:12 my.new.lecture.moved
drwxr-xr-x 3 hallett nogroup 3 Jan 11 13:41 R
-rw-r--r-- 1 hallett nogroup 5 Jan 11 14:02 tmp.R

Some Basic Unix (8)

rm my.new.lecture.moved
rm=remove. This removes the specified file. Unix might prompt you to be sure.

ls -l
total 8
drwxr-xr-x 9 hallett nogroup 13 Jan 12 14:57 cs364
-rw------- 1 hallett nogroup 0 Jan 12 15:16 didntmove.again
drwxr-xr-x 3 hallett nogroup 3 Jan 11 13:41 R
-rw-r--r-- 1 hallett nogroup 5 Jan 11 14:02 tmp.R

Some Basic Unix (9)

cd ~
cp -r cs364 cs364.backup

The -r flag means “recursive”: Copy cs364 and all the files/directories within cs364 and all the files/directories within each directory within cs364, etc. etc. etc.

ls -l
total 11
drwxr-xr-x 9 hallett nogroup 13 Jan 12 14:57 cs364
drwx------ 9 hallett nogroup 13 Jan 12 2015 cs364.backup
-rw------- 1 hallett nogroup 0 Jan 12 15:16 didntmove.again
drwxr-xr-x 3 hallett nogroup 3 Jan 11 13:41 R
-rw-r--r-- 1 hallett nogroup 5 Jan 11 14:02 tmp.R

Some Basic Unix (10)

rm -r cs364.backup
rm: descend into directory ‘cs364.backup/’? y
rm: descend into directory ‘cs364.backup/src’? y
rm: remove regular file ‘cs364.backup/src/src-readme.txt’?
The -r flag again means “recursive”, but answering a prompt for every single file will take a long time…

rm -r -f cs364.backup
The -f flag means “force” the remove (don't ask for permission).
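The touch, mv, cp and rm steps from these slides can be replayed safely in a throwaway directory before you try them on real files. This is a sketch under the assumption that a temporary directory stands in for your home; nothing in it touches ~/cs364.

```shell
# Sandbox sketch: $sandbox plays the role of your home directory.
sandbox=$(mktemp -d)
cd "$sandbox"
mkdir -p cs364/lectures

touch cs364/lectures/my.new.lecture                     # create an empty file
mv cs364/lectures/my.new.lecture my.new.lecture.moved   # move it up and rename it
cp my.new.lecture.moved didntmove.again                 # copy; the original stays
rm my.new.lecture.moved                                 # remove the original

cp -r cs364 cs364.backup                                # copy a whole directory tree
rm -r -f cs364.backup                                   # delete the tree without prompts

ls    # only cs364 and didntmove.again remain
```

Be careful with rm -r -f outside a sandbox: there is no trash can, so a deleted tree is gone for good.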

Some Basic Unix (11)

top
top - 16:04:53 up 7 days, 1:57, 3 users, load average: 0.00, 0.02, 0.05
KiB Mem: 8176704 total, 2137472 used, 6039232 free, 165264 buffers
KiB Swap: 8388604 total, 0 used, 8388604 free. 1660936 cached Mem
PID USER PR  NI  VIRT  RES  SHR S %CPU %MEM   TIME+ COMMAND
  1 root 20   0 35592 2904 1480 S  0.0  0.0 0:02.52 init
  2 root 20   0     0    0    0 S  0.0  0.0 0:00.00 kthreadd
  3 root 20   0     0    0    0 S  0.0  0.0 0:00.52 ksoftirqd/0
  5 root  0 -20     0    0    0 S  0.0  0.0 0:00.00 kworker/0:0H
  7 root 20   0     0    0    0 S  0.0  0.0 0:21.18 rcusched
All the processes running on the machine. Everyone's.

Some Basic Unix (12)

top
u hallett
top - 16:04:53 up 7 days, 1:57, 3 users, load average: 0.00, 0.02, 0.05
Tasks: 120 total, 1 running, 119 sleeping, 0 stopped, 0 zombie
...
 PID USER    PR NI   VIRT  RES  SHR S %CPU %MEM   TIME+ COMMAND
2307 hallett 20  0 124732 2040 1000 S  0.0  0.0 0:00.09 sshd
2308 hallett 20  0  33600 4144 1976 S  0.0  0.1 0:00.08 bash
Just my processes.

kill -9 2307
Kills my process with PID (process ID) 2307.
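To practise on something harmless, you can start your own dummy process and kill that instead of a real session. A minimal sketch: the sleep command is just a stand-in for a stuck process, and the variable names are invented.

```shell
sleep 300 &            # start a harmless background process
pid=$!                 # $! holds the PID of the most recent background job

# Signal 0 sends nothing; it only checks that the process exists.
if kill -0 "$pid" 2>/dev/null; then
  echo "process $pid is running"
fi

kill -9 "$pid"                       # force-kill it, as on the slide
wait "$pid" 2>/dev/null || true      # let the shell collect the dead process

if kill -0 "$pid" 2>/dev/null; then
  still_running=yes
else
  still_running=no
fi
echo "still running? $still_running"
```

Signal 9 cannot be caught or ignored by the process, which is why it works on hung sessions. A plain kill <process id> (without -9) asks the process to exit cleanly and is the gentler thing to try first.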

COMP-364 (c) M Hallett, BCI-McGill
